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Abstract 

We study properties of Fisher distribution (von Mises-Fisher distribution, matrix 
Langevin distribution) on the rotation group SO (3). In particular we apply the 
c/3 , holonomic gradient descent, introduced by Nakayama et al. (2011), and a method 

of series expansion for evaluating the normalizing constant of the distribution and for 
computing the maximum likelihood estimate. The rotation group can be identified 
with the Stiefel manifold of two orthonormal vectors. Therefore from the viewpoint 
\ of statistical modeling, it is of interest to compare Fisher distributions on these 

■ manifolds. We illustrate the difference with an example of near-earth objects data. 

Keywords: algebraic statistics; directional statistics; holonomic gradient descent; 
maximum likelihood estimation; rotation group. 

1 Introduction 



o 
d 



In thi s paper we apply the holonomic gradient descent (HGD) introduced in lNakayama et al 



- 1—1 

X 

20111 ] and a method of series expansion for evaluating the normalizing constant of Fisher 
distribution on the rotation group and on Stiefel manifolds and for obtaining the maxi- 
mum likelihood estimate. Fisher distribution is the most basic exponential family model 
for these manifolds. 



The general theory of exponential family is well established (e.g. iBarndorff-Nielsen 



19781 ] ) . In nice "textbook" cases, the normalizing constant of the exponential family (i.e. 



its cumulant generating function) can be explicitly evaluated and then the calculation of 



* Department of Mathematics, Keio University 
tJST, CREST 

■^Department of Mathematical Informatics, Graduate School of Information Science and Technology, 
University of Tokyo 

§ Faculty of Mathematics and Physics, Kanazawa University 
^Department of Mathematics, Kobe University 



1 



maximum likelihood estimate is also simple. However in general, the integral defining the 
normalizing constant of an exponential family can not be explicitly evaluated. Various 
techniques, such as infinite series expansion, numerical integration, Markov chain Monte 
Carlo methods, iterative proportional scaling, have been applied for these cases. 

Recently, we introduced a very novel approach, the holonomic gradi ent descent, for 
evalua tion of the normalizing constant and solving the likelihood equation (jNakayama et al 
20 1 1| ) . Our approach provides a systematic methodology for these tasks. Note that the 



normalizing constant is a definite integral over the sample space, where the integrand 
contains the parameter of the family of distributions. The likelihood equation involves dif- 
ferentiation of the normalizing constant with respect to the parameter. In the holonomic 
gradient descent, the theory of D-modules is used to derive a set of partial differential 
equations satisfied by the normalizing constant. 

In this paper we apply the holonomic gradient descent and a method of series expansion 
to Fisher distribution on the rotation group 50(3) and on the Stiefel manifold V^IR 3 ) 
of two orthonormal vectors. The Fisher distribution on Stiefel manifolds and or thogonal 
group s has been st udied by number of authors. However only a few papers ([Prentice 



1986l | . [Wood! [19931 ]) study the Fisher distribution on the special orthogonal group SO(p). 



The holonomic gradient descent needs the initial value for the differential equation. 
To evaluate this value, we develop an explicit formula of the infinite series expansion 
for 50(3). A n alternative method is a one- dimensional integration formula proposed by 



Wood! [1993J . In Figure [TJ we illustrate a diagram that clarifies the difference between 
HGD and direct use of gradient descent method. These variations will make the numerical 
evaluation of the maximum likelihood estimator more flexible. 



HGD 

/\ 



(O) _* (1) _^@(2) 



0(0) _* (1) _* (2) 



Subroutine: 
Computing c(0) etc. 
(initial value for HGD) 

(a) holonomic gradient descent. 



Subroutine: 

Computing c(0), Vc(O). 



(b) usual gradient descent. 



Figure 1: The difference between HGD and the usual gradient descent. In the diagram 
(a), a subroutine that computes the normalizing constant c(0) (and related values) is 
called only once. In the diagram (b), the subroutine is called every time the parameter 
is updated. 

The organization of the paper is as follows. For the rest of this section we set up nota- 
tion and summarize preliminary facts on special orthogonal groups and Stiefel manifolds. 
In Section |2] we derive some properties of Fisher distribution on special orthogonal groups 
and Stiefel manifolds. In Section [3] we derive the set of partial differential equations sat- 



2 



isfied by the normalizing constant (Section I3.ip . We also give an infinite series expansion 
for the normalizing constant (Section 13. 2p . In Section H] we apply the results of previous 
sections to the data on orbits of near-earth objects. 



1.1 Notation and preliminary facts 

Here we set up notation of this paper and summarize some preliminary facts. Although 
we are primarily interested in 3 x 3 matrices for practical and computational reasons, we 
set up our notation for general dimension. Let 

V r {R p ) = {Ae R pxr | A T A = I r } (0 < r < p) 

denote the Stiefel manifold of p x r real matrices with orthonormal columns, where R pxr 
denotes the set ofpxr real matrices and A T denote the transpose of A. In particular for 

r = p, 

V r {W) = 0( P ) 

is the set of p x p orthogonal matrices. 

SO(p) = {X G 0(p) | detX = 1} 

denotes the special orthogonal group. 
The total volume of V r (M. p ) is given as 

or rp/2 

Vol{V r {W)) 



r r (p/2)' 
where 

r r ( a )=7r^- 1 )/*JJr[a-i(t-i)]. 



2 

i=i 



See Theorem 2.1.12 of lMuirheadl [1982 . 

Let Vol denote the invariant measure (volume element) on V r (M. p ) and let 

^ = VoK^M) Vol( ' } 

denote the invariant probability measure on V^,(1R P ). Similarly for SO(p), by 
Vol(-)/Vol(iS'0(p)) we denote the invariant measure with 

Vo\(SO(p)) = ivdl(0(p)). 

For a p x r matrix G M. pxr , r < p, let 

6 = QDR, Q G V r (W), D : diagonal, R G 0{r) 
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denote its singular value decomposition (SVD). In this usual SVD, the diagonal elements 
of D are taken to be non-negative. Now let e M. pxp be a square matrix and restrict 
Q,R to be in SO(p). Then the sign of det has to be equal to the sign of detD. Let 
Pi, • • • , Pp > denote the singular values of 0. For non-singular 0, the sign of det D can 
be adjusted by multiplying pi by e = ±1. Therefore we can write 

= QDR, Q,R e SO(p), D = di&g(epi, p 2 , . . ■ , Pp), e = sgndet0. (1) 

We call this decomposition the sign-preserving SVD of with respect to SO(p). We also 
call 0i = epx, 



"1. Fl, " " 

Prentice 


1986 


and 


Wood 1993 



2 Fisher distributions on V r (MP) and SO(p) 

In this section we consider Fisher distribution on V r (M. p ) and SO(p). In particular we 
clarify the difference between Fisher distributions on V^_i(IR p ) and SO(p). Basic facts on 
Fisher distribution on V^.(IR P ) is summarized in Chapter 13 of Mardia and Jupp 2000l . 

Let X denote either V r (M. p ) or SO(p). The density of the Fisher distribution on X 
with respect to the uniform probability measure p is given by 



/(*;©) 



c(0) 



etr(0 T X), X EX, 



where 



G R pxr is the parameter matrix, etr(-) = exp(tr(-)), and 



c(0) 



etr(G T X)p(dX) 



(2) 



x 



(e.g. 



Khatri and Mardia 1977 



is the normalizing constan t . For V r (M. p ) it is well known 

Muirheadl |l982) . Ichikusel |200d |) that c(0) is a matrix-valued hypergeometric function 
c(0) = F 1 (p/2,Y), where Y = T 0/4. However properties of c(0) for the special 
orthogonal gro up X = SO(p) a re not studie d in detail. For the case of 5*0(3), following 



the approach in lPrenticd [1986| . IWoodl |1993| used the correspondence between the Fisher 
distribution on SO (3) and the Bingham distribution on the unit sphere S 3 in M 4 and 
showed that c(0) can be written as a one-dimensional integral involving the modified 
Bessel function of degree zero. In Section [3] we derive differential equations and an infinite 
series expansion of c(0) for 50(3). 

Since x p is uniquely determined from 



x 



1; 



Let Xi, . . . , x p be the columns of X e SO(p). 
. . . , x v _\, we can identify SO(p) with V p _i(W p ) 



by 



[xi, ...,Xp)e SO(p) <rt (asi, 



e Vr, 



(3) 



This leads to the question of differences of Fisher distributions on SO(p) and those on 
V^,_i(IR p ). Let = (0i, . . . , Op) G M. pxp be a parameter matrix for Fisher distribution on 



SO(p). By setting p = 0, we clearly obtain a Fisher distribution on V^_i(M p ). Hence 
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the family of Fisher distributions on V p ~i(M. p ) is a submodel of the family of Fisher 
distributions on SO(p). It can be easily seen that for p = 2, 6 2 is redundant and these 
two families are the same. However for p > 3, the family of Fisher distributions on 
V p _i(M. p ) is a strict submodel of that on SO(p). We state this as a lemma. 

Lemma 1. For p > 3, the family of Fisher distributions on V p -i(M. p ) is a strict submodel 
of that on SO(p) . 

Proof. In general, let K be a positive integer and consider a i^-dimensional exponential 
family 

p(x\8) = ^yexp (6 T x) (x E S), C(9) = J exp(6 T x)v(dx), 

where 9 is a K- dimensional vector, 5* is a smooth submanifold of M, K and v is a measure 
on S. Assume that C(8) exists in some open neighborhood of the origin 6 = 0. The 
parameter 9 is estimable if and only if 

Affine(support(z/)) = R K 

(Corollary 9.6 of B arndorff- Nielsen jl978 |). where support(zz) is the support of v and 



Affme(C/), U C R K , denotes the affine hull of U. 

We now show that if p > 3 then Affine(SO(p)) = M pxp and the distribution p(X\Q) oc 
etr(G T X) is estimable, which is sufficient to prove the lemma. Let L = Affine (SO(p)). 
We first see that the zero matrix belongs to L. Let q G { — 1, 1} for 1 < i < p. 
Then the average of 2 P_1 matrices diag(ei, . . . , e p _i, nj=i e «) m SO(p) * s zero. Hence 
G L. We now show that e^ej belongs to L (Vz, j), where ej = (0, . . . , 1, . . . , 0) T is the 
standard basis vector with 1 as the i-th element. Then together with G L it follows that 
L = W xp . Take matrices Pi G SO(p) (i = 1, . . . ,p) such that P^e; = e±. For example, 
let Pi = I p and P« = e\ej — eiej + J2j^n e j e J f° r i ^ 1- Then e^ej G L if and only if 
eiej = Pitit^Pj G L. Now it suffices to show that t\t~[ belongs to L. Take the average 
of 2 P ~ 2 matrices diag(l, ■ ■ ■ , e p _i, Yl^=2 e «)- Then e\ej G L. □ 

For X = V r (M. p ) the maximum likeli hood estimate (MLE) of t he Fisher distribution 



is obtained by the following procedure ( iKhatri and Mardial 19771 ]) . Let . . . 



be a data set on V r (R p ). Let X = N' 1 J2t=i x[t) be the sample mean matrix and let 
X = Qdiag(c/i, . . .,g r )R be the SVD of X, where Q G V r iW), R G 0{r) and g x > > 
g r > 0. Then the maximum likelihood estimate is given by G = Q diag(0i, . . . , 4> r )R, 
where 0, is the solution of 



This procedure is also valid for SO(p) if we use the sign-preserving SVD in ([T]). We 
give the fact as a lemma since it is not explicitly proved in the literature. Remark that for 
SO(p) the normalizing constant c(G) in (T5]) is invariant under a transformation i— > QQR 
for any Q,R G SO(p). 
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Lemma 2. Let XW,...,!^ be a data set on SO(p). Let X = N" 1 J2t =1 X® be the 
sample mean matrix and X = Q diag(g 1; . . . , g p )R be the sign-preserving SVD of X , where 
Q,R G SO(p) and \gi\ > g 2 ■ ■ ■ > g p > 0. Then the maximum likelihood estimate of the 
Fisher distribution on SO(p) is = Qdiag(0i, . . . ,<j) r )R, where <pi is the maximizer of 
the function 

p 

(0fc)fc=l ^ X^ fcfl,fc ~ lo g C ( dia g(^l' ■■■At)), (4) 
k=l 

or equivalently, the solution of 

y^?_i dhXhk)u(dX) 

U, i = l,...,p. (5) 



IsO(p) Xii eX P(ELl 4>kXkk)K dX ) 

Iso(p) ex P(ELi 4>kXkk)v(dX) 



Proof. We change the parameter variable from G to $ = (</>y)i j=i = Q T QR T . Then the 
(1/iV times) log likelihood function is written as 

tr(6 T X) - logc(e) = tr($ T G) - logc($), (6) 

where G = diag(gi, . . . , g p ). Since ([6]) is strictly convex in $, the unique maximizer makes 
its first-order derivatives zero. Note that the first term on the right hand side of ([6]) does 
not depend on the off-diagonal elements of <3>. Therefore the condition for maximization 
of ([6]) with respect to an off-diagonal element is written as 

= -^-logc($), {i^j). (7) 

0<Pij 

We now fix i ^ j and evaluate (d/dfaj) logc($) at (0^/)^^ = 0. Then we have 

IsO(p) "^W 6Xp 



dcf) 



d 

logc($) 



, ,=o W^f Jso( P ) 



Iso(v) ex P(ELi 0kkX kk )v(dX) 



However 



Xij 



v p p 

exp(y2<f) kk x kk )n{dX) = / (-x i:j )exp(y2(f) kk x kk )fi(dX) = 

JsoM 



SO(p) k=1 JSO(p) k=1 

because the uniform distribution /i on SO(p) is invariant with respect to multiplication of 
— 1 to the z-th row and the i-th column of X. Therefore any diagonal matrix $ satisfies 
(J7J). The log-likelihood function of the diagonal matrix is @ and the maximizer satisfies 
©. □ 

When detX < 0, it is not correct to use the ordinary singular values of X on the 
right-hand side of (JSJ). 
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Remark 1. The determinant of the sample mean matrix X is not necessarily positive 
even if all X®, t = l,...,N, are in SO(p). Indeed for the case of uniform distribution 
on SO(p) we prove 

P(detX < 0) (N^oo), 

as long as p > 3. By the central limit theorem y/N(X — E(X)) converges to a Gaussian 
random matrix Z with the same covariances as X. We will show E(X) = and the 
covariances of X are diagonal when p > 3. Then Z and any sign change of a column 
of Z have the same probability distribution and therefore the probability of det(Z) < is 
1/2. Hence the probability of det(X) < converges to 1/2. 

To prove that the mean is zero and the covariance is diagonal, it is sufficient to consider 
E(Xn) and E(X a iXi, 2 ) (1 < a, b < p) by symmetry. Define a random matrix Y by 
Yij = Xij for j ^ 1,3 and Yy = —Xij for j = 1,3. Since both X and Y have the 
uniform distribution on SO(p), we deduce that E(Xn) = E(—Xu) = 0, E(X a iXb 2 ) = 
E(-X al X b2 )=0. 

Remark 2. Even if detX > 0, the determinant of the estimated parameter © may be 
negative. Indeed, let the sign-preserving singular values of X and be g = (gi, g 2 , ^3) 
and (f> = (</>i, 02, ^3), respectively. We prove that g\g 2 gz an d 010203 can have the opposite 
signs. To see this, we first consider the case 0i = 0, 2 > and 3 > 0. Then, by using 
the Taylor expansion formula developed in Subsection \3.2L we deduce that g\, g 2 and g% 
are strictly positive. By continuity, there exist some 0i < 0, 02 > and 03 > while all 
gi 's are positive. 



3 Computation of the normalizing constant and its 
derivatives 

For computing the maximum likelihood estimate of Fisher distribution we need numerical 
evaluation of the normalizing constant c(0) of ([2]) and its derivatives. In this section we 
study two methods for this purpose. The first method is the holonomic gradient descent. 
In the second method, we use series expansion of etr(0 T X). The second method is also 
used to compute the initial value of HGD (see Figured] (a)). 

3.1 The holonomic gradient descent for Stiefel manifolds and 
special orthogonal group 



Let us briefly describe the holonomic gradient descent. As to details, we refer to lNakayama et al 



201 1| . An algebraic computation is the first step; we construct linear ODE's (ordinary 
differential equations) satisfied by c(0) with respect to each % by Grobner bases of a set 
of partial differential equations satisfied by c. Variables other than 0y appear as param- 
eters in the ODE. The rank of ODE's is called the holonomic rank. The ODE's give a 
dynamical system for the function c(0) etr(— T A), the reciprocal of the likelihood. The 
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gradient of the function can also be expressed in terms of derivatives of the reciprocal 
standing for the standard monomials and X. The second step is a numerical procedure; 
a point in the dynamical system moves toward the maximum likelihood estimate along 
the gradient direction, simultaneously updating the values of c(0) and its derivatives. 

For the holonomic gradient descent, we study differential operators A annihilating 
c(9), that is, A ■ c(0) = 0. Denote the differential operator d/d6ij by dij. We first study 
the special orthogonal group and then study the Stiefel manifold. 



3.1.1 The case of special orthogonal group 

Let G IR pxp . We consider the following three types of differential operators: 
p p 
A}f = Y dikdjk - Sij, A\f = Y dkid k j - Sij (i < j), 

A {2) = det(<%) - 1, 
p 

A f = £ (- jk9 ik + dikdjk) , Af = (-OkAi + Okidkj) (i < J), 
k=i fc=i 
where Sg is the Krone cker's delta. The following lemma is an analogy of Theorem 2 of 



k=l k=l 



p p 



Nakayama et al.l 2011 



Lemma 3. The above differential operators annihilate c(0) of SO(p). 

Proof. We first prove that the operators A\j , A\j and A^ annihilate etr(0 T X) for any 
X G SO(p). Then they also annihilates c(0) because A-c(Q) = f so ^ A ■ etr(0 T X) fi(dX) 
for any operator A. Since ■ etr(0 T X) = Xjj-etr(0 T X) and XX T = I, we have 

A$ ■ etr(0 T X) = (Y^x ik x jk - Si-\ etr(0 T X) = 0. 

Similarly, we obtain A[f ■ etr(0 T X) = from X T X = I and A^ ■ etr(0 T X) = from 

det(X) = 1. Next consider A\f and Af . We note c(0) = c(Q0) = c(0Q) for any 
Q G SO(p). For any fixed i < j, define a rotation matrix Q = Q(e) by 

Q = (cos e)(Eu + Ejj) + (sme)(-Eij + EjA + ^ E kk , 

k^i,j 

where E k i is the matrix whose (i, j)-th component is 1 if k — i and / = j and otherwise. 
Then 

= c(Q0) - c(0) 



c ( ~ e Yl °i kEik + e Yl 9ikE i k + °( e ) ) 

V k k J 

P 

e Y(~ e jk d ik + dikdjk) ■ c(0) + o(e), 



c(0) 



fc=i 



S 



as e — > 0. Hence we have A® • c(0) = 

c (eg) = c(6). 



Similarly we obtain Af) ■ c(0) 



from 
□ 



Let D be the ring of differential operators with polynomial coefficients in 6^- and let I 



(i) 



denote the ideal generated by the above differential operators A\- 

,0 



, A\f in L>. Also let 

Jdiag denote I restricted to diagonal matrices 9 = diag(0n, . . . , 9 pp ). I ■ /(O) = implies 
-^diag ■ /(diag(O)) = 0. We denote by R p the ring of differential operators with rational 
function coefficients in 0jj, 1 < i, j < p. 

The following propo sition is necessary for the holonomic gradient descent. We refer to 
Nakayama et al.l 20111 for the definition of holonomic ideals in D and zero- dimensional 



ideals in R p . Once zero-dimensionality of R P I is proved and a Grobner basis is constructed, 
we can find ODE's and apply the holonomic gradient descent for the maximum likelihood 
estimate. 

Proposition 1. If p = 2, then the ideal I is holonomic. In particular, the ideal R2I is 
zero- dimensional. The holonomic rank is equal to 2. 



The propo sition is proved by Macaula y2 (IGrayson and Stilfmanl ) and the yang package 
on Risa/Asir (IRisaAsir developing team! ) by utilizing Grobner basis computations in rings 
of differential operators. Also the set of generators of / is obtained by nk_restriction 
function of asir from the integral representation of c(G) as 



■$21 



I" 022, #3 
'12) 022 5 



9l = -012 - 021, 92 

g 4 = (0 22 + e n )d 21 + ( 

#5 = (6*21 - 6*12)022021 + 

96 — ( — #21 + ^12)021 + (091 — 20 12 021 + 



2 2 i + 



22 



?22 + 01l)022 + 022 — 022 ~ 011, 



21 — ^12C21 T ^22 
+ (022 + 01l)022 — 022 ~ 201X022 ~ 011- 

Furthermore the set of generators of /diag is given as 

hi = (—022 — 01l)011 — 011 + 022 + 011, 



2011022 + 011 + 012)0 



22 



011 + 



22- 



Proposition 2. If p = 3, then the ideal R 3 I is zero- dimensional. The holonomic rank is 
less than or equal to 4. R 3 /(R 3 I) is spanned by 1,031,032,033 as a vector space over the 
field of rational functions. 

The proposition is proved by a large scale c omputation on R i sa/As ir with Grobner 
bases. The algorithm for it is explained in, e.g., iNakavama et al.l 2011 1 . Programs an d 
obtained data are at the website OpenXM/Math (jOpenXM Mathematics Repository! ) . 
We conjecture that / is holonomic and consequently R P I is zero-dimensional for any p in 
the case of SO{p). 
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3.1.2 The case of Stiefel manifold 



Let G M. pxr (r < p). Consider the following differential operators: 

v 

^ d ki d k j - Sij (1 < i < j < r), 



k=l 



A^f — {—djkdik + dikdjk) 

k=l 

p 

Af) = ^ {—dkjdki + dkidkj) 



(l<i<j <p), 
(1 < i < j < r). 



k=l 



Lemma 4. The above operators annihilate c(G) ofV r < 



Proof. The proof is similar to that of Lemma [31 The operator annihilates etr(0 T X) 



if X e V r (W). Since c(9) = c(QQ) = c(QR) for any Q e 0(p) and 6 O(r), we have 



Af) ■ c(6) = and ■ c(9) = 0, respectively. 



□ 



Let I denote the ideal generated by the above operators and let /diag denote its re- 
striction to diagonal matrices 9 = diag(#n, . . . , 8 rr ) G M. pxr . We denote by R rtP the ring 
of differential operators with rational function coefficients in 6^, 1 < % < p, 1 < j < r. 

Proposition 3. Ifr = 2,p = 3, then the ideal i?2,3^ zero-dimensional. The holonomic 
rank is equal to 4. ^2,3/(^2,3-^) spanned by 1,811,812,8^ over the field of rational 
functions. 

This proposition is also proved by a computatio n on Risa/Asir. Programs to verif y 
the proposition are at the website OpenXM/Math ( lOpenXM Mathematics Repositoryl ) . 
We conjecture that I is holonomic and consequently R ryP I is zero- dimensional for any r 
and p in the case of K(^ p )- 

We close this subsection with some notes on our result and a study of hypergeo- 
metric functions. For the matrix-valued hypergeometric function c( Q) = nF-\(yl2,Y) 
Y = 6 T 0/4, the following partial differential equation is well known f Muirheadl |l970 



Muirheadl . Il982l Thm.7.5.5]). Let yi, . . . ,y r denote the eigenvalues of Y. F satisfies the 



following partial differential equations: 




1 1 ^ 

- + 9 £ 



2 .*r-!,.yi 



8,F 



y.j 



1 r 

- y 



y.j 



v., 



d,F 



I,..., p. 



Muirheadl |1970| obtained these partial dif ferentia l equa ti ons from the part ial differential 



equations satisfied by zonal polynomials (jjamesl 1968J, Takemural . Il984l Sec.4.5]). In 
Appendix |A] we check that for low dimensional cases these equations are also derived 
from the differential operators in Lemma HI 
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3.1.3 Practice of HGD 

Although the HGD is a general method which can be applied to broad problems, we need 
a good guess (oracle) of a starting point to search the optimal point (MLE). We explain 
why we need a good guess of a starting point with an example of V 2 (1R 3 ). Let G be the 
optimal point for a given data and = Pij(9)Q be the Pfaffian system to apply for the 
HGD. The denominator of the coefficient matrix P^- is a polynomial in 9. The Figured] 
shows the zero set in the 9n, 9yi space of the product of the polynomials standing for Pn 
and P12 when 9ij,i = 2, 3 is restricted to the constant (Q 1:2) ij, which is the MLE for the 
comets data (Section |4.2p . 




Similarly, in the case of 50(3), the Figure [3] shows the zero set in the 9n,9i 2 space 
of the product of the polynomials standing for the Pfaffian system when {9^ \ 7^ 
(1, 1), (1, 2)} is restricted to the MLE for the comets data (Section 14. 2p . 



































J 

h 






\/ 



Figure 3: Singular locus in 9n,9 12 space for 50(3) 

The numerical integration procedure of the Pfaffian system becomes unstable near the 
zero set of the product of the polynomials, which is called the singular locus of the Pfaffian 
system. Therefore, the starting point must be in the same component with the optimal 
point in the semi-algebraic set defined by the zero set. In our current implementation of 
HGD, we find the starting point by preparing a table of the values of the normalization 
constant (integral) at grids and making the exhaustive search of the optimal point on the 
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grids or by using an approximate MLE obtained by an other method. Note that the table 
can be used not only for specific data but also for general data. 

6.Z series expansion approach for 50(3) and V 2 (R 3 ) 

We describe a method to compute the maximum likelihood estimate by an infinite series 
expansion of c(0). By Lemma |2j computation of the maximum likelihood estimate for 
SO(p) is reduced to computation of c(diag(0i, . . . , P )) and its derivatives with respect 
to 0j's, together with the usual gradient method. In this subsection we give an explicit 
series expansion of c(diag(0i, 02, 03)) when p = 3. Note that c(0) for any G e M 3x3 is 
also obtained via sign-preserving SVD due to the rotational invariance of c(G). By using 
the expansion formula we also clarify the difference between the normalizing constants for 
the orthogonal group 0(3) and the special orthogonal group 50(3). The series expansion 
approach for V^M 3 ) is also discussed. 

From mathematical viewpoint, the holonomic descent and the infinite series expansion 
is related as follows. In the general recipe of the holonomic gradient descent and holonomic 
systems, we can construct series expansion of the normalization constant c(G) for any p up 
to any degree modulo finite constants in an algorithmic method from a holonomic system 
of differential equations satisfied by c(Q), which is obtained in the previous subsection. 
The existence of finite recurrence relations for coefficients of the series is proved by the 
holonomicity. This is a multi- variable generalization of the fact that coefficients of series 
solutions of linear ODE satisfy a finite recurrence relation. Since this computation requires 
huge computational resources, constructing the series expansion in a more efficient way 
is preferable to using the general algorithm. Here we derive an infinite series expansion 
for SO (3) with an analysis of integrals. 

Let E[-] denote the expectation with respect to the uniform distribution on S0(3). 
Let 01,02,03 be the sign-preserving singular values of G. By the rotational invariance, 
the expansion of c(G) is 

oo 1 oo 1 

c ( ) = E ^m?® T x) h ] = E xi^^ 11 + ^ 2X22 + 

h=0 ' h=0 

oo -. 

= E 1 ^^mA^\ (9) 

k,l,m=0 

and the problem is reduced to the evaluation of 

E(k,l,m) = E[x k n x l 22 x™}. 

Again by the rotational invariance we can simultaneously change the sign of any two of 
^115^225 ^33- From this it is easily seen that E(k,l,m) = unless k,l,m are all even or 
k, I, m are all odd. 

Note that for 0(3) we can individually change the signs of in, £225 ^33 • Hence for 0(3) 
E(k, l,m) = unless k, I, m are all even and c(Q) is indeed a function of the eigenvalues 
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of Y — T 0/4. Therefore the difference between c(0) for SO(3) and c(0) for 0(3) comes 
from terms E[k, l,m] =0 with k, I, m all odd. 

We now express X = (xij) G 5*0(3) by the Euler angles 9, 0, if). 



/ six 



X 



> sin tj) + cos 9 sin cos ip — cos cos ip + cos sin sin ^\ 



sin cos — sin sin tp + cos 6* cos cos -0 sin cos -0 + cos 9 cos sin -0 

\cos9 — sin # cos "0 —sin 6* sin?/; 

The Jacobian of the above transformation is sin 9 and the range of variables is 

< 9 < 7T, < < 2vr, < V < 2vr. 

Hence the integral of / over 50(3) with respect to the uniform probability measure is 
expressed as 



f{X)dn{X) 



1 



SO{3) 



57T 



f27T /*2"7T 

dcf) / dip f(X(6,(f),ip)) sin9. 



o 



o 



For 



/ = x^x^x^ = (sin#sin0) fc (— sin 0sin0> + cos # cos cos V?)^— sin9 sinip) 1 
we have 



/ • sin 9 = (-l) m sin fe+m+i 9 sin fe sin" 1 V 



n=0 



n=0 



Define 



J2 (Jj (-1)" sin n sin™ 0> cos'"™ cos'"™ cos'"™ ^ 

l) m+n sin fc+m+1 9 cos'"™ sin fc+n cos'-™ sin m+n ^ cos'"™ if). 

. (m-l)H(n-l)!! 
i [m, nj = - 



(m + n)H 

where (2a)!! = n^=i(2j) an d (2a — 1)!! = rij=i(2j ~~ 1) f° r each non- negative integer a. 
Then from well-known results on the definite integrals of trigonometric functions we have 



E(k,l,m)= (TK 

0<n<I ^ ' 



+ m + 1, I - n\ ■ I[k + n, I - n] ■ I[m + n,l-n]. (10) 



By numerical experiments we found that f fTOj) can be computed easily and we can evaluate 
c(0) by the right-hand side of © to a desired accuracy. For large k, I, m the value of 
E(k, I, m) can be approximated by Laplace's method. Laplace approximation to E(k, I, m) 
is given in Appendix [Bj 
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We now consider the maximization of (j3J) with respect to {0j} 3 =1 when we adopt 
direct use of the gradient descent as in Figured] (b). The gradient method uses the first 
derivatives of (j3J). The Hessian matrix is also needed if one uses the Newton method. 
Since the first term of (jlj) is linear, it is sufficient to give the series expansion of the 
derivatives of c(diag(0i, 02> 03))- They are easily obtained from the expansion of c(O). 
For example the derivative with respect to 0i is 

9c(diag(0 1 ,0 2 ,0 3 )) ^ 1 , , m , > 

^ = 2. M7I^^ 3E(A; + 1 ' / ' m) - 

r k,l,m=0 

Similarly, 



^ L k,l,m=0 

(9 2 c(diagpi,0 2 ,03)) _i Ak.l.m-Ffi,, , , ■, \ 

k,l,m=0 

Finally we note that the series expansion of c(O) for 5*0(3) is directly used for the max- 
imum likelihood estimate of V^M 3 ). Let X 1:2 be the first two columns of the averaged data 
matrix X e M 3x3 . Let X 1:2 = Q diag(gi, g2)R be the (usual) SVD. Then, as stated before 
LemmaEl the maximum likelihood estimator for V^M 3 ) is given by = Q diag(0i, 4>2)R, 
where (4>j) is the maximizer of 



2 / . 2 \ 2 

y2 4>k9k - log / exp( (j) k x hk )n{dX) = V] 4>k9k - log c(diag(0i, 2 , 0)) 
k=i \Jv^) k=1 J k=1 

in terms of c(0) for 50(3) . Then the MLE is obtained via the series expansion of c(0). 



4 Application to data on orbits of near-earth objects 



In this section as an illustration of the above discussion, we fit Fisher distributions of 
50(3) and V^M 3 ) to data of orbits of near-earth objects. We obtained the data from the 
web page of Near Earth Object Program of National Aeronautics and Space Administra- 
tion (cf. http : / /neo . jpl .nasa. g ov/ cgi-bin/neo_elem). Near-earth objects are comets 



Jupp and Mardial [19791 ] fitted Fisher distribution on 



and asteroids around the Earth. 

3 ) to data of c omets from iMarsdenl 19721 ]. but did not consider Fisher distribution 



V 2 

on 50(3). See also iMardial 19751 ] for analysis of data of perihelion direction. 

The near-earth objects have ellipsoidal orbits with the Sun as their focus. The orbits 
are characterized by the following two directions: 

1. the perihelion direction x±, which is the direction of the closest point on the orbit 
from the Sun. 
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2. the normal direction X2 to the orbit, which is determined by the right-hand rule for 
the rotation of the object. 

The pair (tci,^) is an element of V^M 3 ). We can also define x% = X\ x X2 such that 
(x±, X2, X3) is an element of SO(3). 



1 




O: the Sun 



Figure 4: Orbits of near-earth objects 

We analyzed 151 comets and 6496 asteroids separately. To obtain a meaningful result, 
we identified a tight cluster of 67 similar comets, which we treated as one comet, and 
therefore the actual sample size of comets is N = 85. Parts of the data are shown in 
Table [Hand Table El As discussed in Section [2] we can analyze the data either on V^M 3 ) 
or on 5*0(3). 



Table 1: Part of data on 85 comets around the Earth (cci, X2, X3). 



index 


object 




X-2 


a=3 


1 


lP/Halley 


( 0.527,-0.304, 0.794) 


( 0.010,-0.931,-0.363) 


( 0.849, 0.199,-0.488) 


2 


2P/Encke 


( 0.901, 0.431, 0.048) 


(-0.001, 0.113,-0.994) 


(-0.434, 0.895, 0.102) 


3 


3D/Biela 


(-0.341, 0.700, 0.628) 


(-0.010, 0.665,-0.747) 


(-0.940,-0.261,-0.220) 


4 


5D/Brorsen 


(-0.235, 0.939, 0.250) 


( 0.003,-0.257, 0.966) 


( 0.972, 0.227, 0.058) 












85 


P/2009 L2 (Yang-Gao) 


(-0.164,-0.961, 0.221) 


(-0.005, 0.225, 0.974) 


(-0.986, 0.159,-0.042) 


mean 




( 0.115, 0.113, 0.022) 


( 0.001,-0.102,0.038) 


( 0.140,-0.233,-0.091) 



4.1 The test of uniformity based on Rayleigh's statistic 

As a preliminary analysis we test whether the orbits of the comets and asteroids are 
uniformly distributed over V^IR 3 ) or SO (3). 

We first recall the Rayleigh's statistic for Stiefel manifolds. Let x 1:r be the sample 
mean matrix of a data set on V r (M. p ) and N be the sample size. Under the null hypothesis 
of uniformity over V r (M. p ) the Rayleigh statistic 

S 1:r =pN-tT(xl r x 1:r ) (11) 
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Table 2: Part of data on 6496 asteroids (xi, x 2 , x 3 ). 



index 


object 




x 2 


x 3 


1 


433 Eros 


(-0.548, 0.837, 0.004) 


(-0.155,-0.110, 0.982) 


( 0.822, 0.538, 0.187) 


2 


719 Albert 


( 0.939,-0.340, 0.082) 


(-0.014, 0.201, 0.980) 


(-0.340,-0.920, -0.183) 


3 


887 Alinda 


(-0.191, 0.981,-0.030) 


( 0.153. 0.057, 0.987) 


( 0.970, 0.184,-0.160) 


4 


1036 Ganymed 


( 0.933,-0.140. 0.331) 


(-0.262, 0.365, 0.893) 


(-0.250,-0.920, 0.304) 












6496 


(6344 P L) 


( 0.536, 0.842,-0.070) 


(-0.005, 0.082, 0.997) 


( 0.844,-0.530, 0.048) 


mean 




( 0.074, 0.018,-0.000) 


( 0.012. 0.003, 0.949) 


( 0.016,-0.070. 0.002) 



is asymptotically distributed according to the chi-square distribution with rp degrees of 
freedom. Similarly we can define the Rayleigh statistic for the special orthogonal group. 
Let x be the sample mean matrix of a data set on SO(p) and N be the sample size. Under 
the null hypothesis of uniformity over SO(p), the Rayleigh statistic 

S = pN ■ tr(x T x) (12) 

is asymptotically distributed according to the chi-square distribution with p 2 degrees of 
freedom (see Remark [T]). 

From Table (TJ the sample mean matrix of comets' data is calculated as 




0.257 0.044 0.189\ 
0.158 -0.052 -0.146 
0.079 0.765 0.004/ 



Since the (3, 2) element of x is large, the orbital plane of the comets are typically close 
to that of the Earth. Let xi :2 be the first two columns of x. The Rayleigh statistic (fTTT) 
for V 2 (M. 3 ) is 

S 1:2 = 3 ■ 85 • tr(xl 2 x 1:2 ) = 175.2 
with the p- value almost zero. The Rayleigh statistic ()12p for SO (3) is 

S = 3- 85 -tr(^ T ^) = 189.8 

with the p- value almost zero. 

Similarly for asteroids data in Table |2] the sample mean matrix is given as 

/ 0.074 0.012 0.016\ 
x = 0.018 0.003 -0.070 
\-0.000 0.949 0.002/ 

and the null hypothesis of uniformity is rejected by both 

Si :2 = 3 • 6496 ■ ti(xl 2 x 1:2 ) = 1.77 x 10 4 

and 

S = 3- 6496 • ti(x T x) = 1.78 x 10 4 . 
The p-values are almost zero. 
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4.2 Maximum likelihood estimate of Fisher distributions 



) is 




0.689 


0.341 


0.394 


-0.229 


0.496 


4.273 



We compute the MLE (maximum likelihood estimate) of the Fisher distribution on V^IR 3 ) 
and 5*0(3) by using the two methods described in Section [3j For clarity we denote the 
parameter of the Fisher distribution on V^M 3 ) and 50(3) by Qx-.2 and 0, respectively. 

First we compute the MLE by the holonomic gradient descent with solving numerically 
the associated dynamical system along gradient directions. We add a superscript (h) as 
qw f or values computed by the holonomic gradient descent. For the comets' data the 
MLE of the Fisher distribution on V^( 



e (h) 

U l:2 



/0.1 0.1 

The starting point is I 0.1 —0.1 | which is found by the exhaustive search of the optimal 

\0.1 5.1, 

point on the grids with = ±0.1, ±5.1 (Section l3.1.3p . We apply the HGD and obtain an 
optimal point and apply it again to correct numerical errors and obtain the MLE stated 

/ (0.1,1.0) (0.1,1.0) \ 
first. The search domain of the HGD is I (0.1, 1.0) (—1, —0.1) . Here the (1, l)-entry 

V (0.1,1.0) (0.1,10.0) / 
(0.1, 1.0) of the search domain means that the variable 9u is confined in the interval 
(0.1, 1.0) during the gradient descent. Other entries stand for corresponding variables. 
The MLE of the Fisher distribution on 5*0(3) is 

2.953 0.200 0.871 N 
6)( h ) = ( -0.423 -0.317 2.390 
0.378 5.566 0.251, 

Since the MLE is near to the singular locus of the Pfaffian system in case of 50(3), the 
grid method (Section 13.1. 3p to find a starting point does not work well. We use the MLE 
obtained by the series expansion method below as the starting point. In this case, the 
HGD is used to confirm and make the answer by an other method more precise. 

For the asteroid data, the difference scheme of the dynamical system is instable and we 
cannot find the MLE even when we start from the MLE obtained by the series expansion. 

We next compute the MLE (maximum likelihood estimate) of the Fisher distribution 
on V 2 (1R 3 ) and 50(3) by using the series expansion approach. We add a superscript (s) 
as for values computed by this method. For the comets' data the MLE of the Fisher 
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distribution on V^IR 3 ) and its SVD are 



e (s) 

U l:2 



0.689 


0.341\ 


0.394 


-0.229 


0.496 


4.273/ 


0.098 


0.835 


-0.041 


0.547 


0.994 


-0.060 




0.126 0.992 
0.992 -0.126 



The MLE of the Fisher distribution on SO (3) and its sign-preserving SVD are 





( 2.953 


0.200 


0.871\ 




-0.423 


-0.317 


2.390 




^ 0.378 


5.566 


0.251/ 




f -0.109 


0.964 


0.242 


-\ 


0.048 


0.248 


-0.968 




^-0.993 


-0.093 


-0.073 




0.128 0.991 0.041 
0.879 -0.132 0.458 
0.459 -0.023 -0.888, 



Note that det x > but det 6 < 0. 
For the asteroid data the MLEs are 



'0.157 0.254 

eg, = [ 0.038 0.060 

^0.005 19.568 

'0.013 0.972 

0.003 0.235 

1.000 -0.013 




19.570 \ /0.000 1.000 
0.161 J 1 1.000 -0.000 



and 

^0.001 19.601 0.056/ 

'19.603 0.000 0.000\ / 0.000 1.000 0.002 N 

0.000 0.908 0.000 -0.855 -0.001 0.518 

0.000 0.000 0.747/ \-0.518 0.002 -0.855, 

The AIC values are given in Table |3j For each data, AIC was minimized by the Fisher 
distribution on SO (3). 



0, 


291 


0.257 


-0.781X 


0. 


.817 


0.056 


0.134 


0. 


001 


19.601 


0.056/ 


0. 


013 


-0.721 


0.693 N 


0. 


003 


-0.693 


-0.721 




000 


0.011 


-0.007, 
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Table 3: AIC of each data and each model. 







comets 






asteroids 






Uniform 


V 2 (R 3 ) 


SO{3) 


Uniform 




SO(3) 


AIC 





-207.0 


-219.7 





-3.47 x 10 4 


-3.48 x 10 4 



A Partial differential equation for qFi(j)/2,Y) 

If G = diag(^jj) is diagonal, then yi = 0^/4. By change of variables from (jHJ) we have 



p_ r _^l + 1 _ y _J^U_I y _w_a,. 

2 2 2 2 ^-y> 



= %i + (P~ r)^d u + y — !_ {dA - u i) u } - 1. (13) 

We now show that the numerator of (fT3]) belongs Jdiag for small dimensions. 

For p = r = 2 (i.e. for 0(2)) by Macaulay2 we checked that the above I is holonomic. 
Also by asir (nk_restriction) , a set of generators of /diag is given as 

hi = (-02 2 + #nH 4 i + 60110?! + (20 2 2 2 - 29 2 u + 6)d 2 n - 60110H - 2 22 + 0^ - 3, 



^2 = (022 — 011 ) 011 — 011011 + 022022 ~ 022 + ^ 
54 . /) a o3 1 /qo /) \o2 



111 



^3 = 0220H + 0110220H + (3022 ~ 022)011 — 011022011 ~ 20 22 , 
h A = 0H022011 + (011022 — 022)011 + ( — 011 — 1)022 + 022, 

h 5 = -0n + 022- 
Looking at h 2 and h 5 we have 

h - 2 \ I f) 2 1 jjllfjll ~ ^22022 
- (# 2 2 - ^llJ yj\l H g 2 _g2 

^2 . , I o2 , 022022 _ 011011 

+ /i 5 = ^ 9 n + 



0l2 - 011 5 ' I ^ 0l2 " 0?1 

These are the same as (fT3|) for p = r = 2. 

For the case of V^M 3 ) (p = 3, r = 2) by Macaulay2 we have checked that / is holonomic 
By asir (nk_restriction) Jdiag has the set of generators: 

hi = —011022011 + ( — 022022 — 3022 + 022)011 + 011022, 

h 2 = 011022011 + 022011 — 011022022 — 011022, 

^3 = 011011 + 20110H — 022022 — 2022 022 + 022 ~~ 011, 

/l 4 = —01101! + (022022 + 2#22022 — 022 — 1)011 + 011022022 + ^11^22 ~ 011022 022, 
h 5 = ( — 011022022 — 011022 + 011022)011 ~ 022 022 ~~ 40 22 022 + (022 — 2) 22 + 20 22 , 
h& = —011022011 + (022 — 011022)022 + (2022 — 011 ) 022 ~ 022 + 011022- 
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Looking at he 



—#11022^11 + (022 — Q\\®Tl)<$n + (2^22 — ^ll)^22 ~ #22 + #11^ 
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we see that it coincides with the case ofp = 3,r = 2,z = 2in (fl3]) . 

B Asymptotic evaluation of ^(/c, /, m) 

We derive an asymptotic form of E(k,l,m) when k,l,m simultaneously go to infinity. 
Let k = na, I = nj3 and m — 717 where a, /3 and 7 are fixed positive numbers. We use 
Laplace's method to show 



as n — > 00. The integrand ijjj^i^ of -E(/c, /, m) is maximized at four points (xn, X22, ^33) = 
(1, 1, 1), (—1, —1,1), (—1,1, —1) and (1, —1, —1) as long as k, I, m are all even or all odd. 
By symmetry it is sufficient to consider the neighborhood of diag(l,l,l), where X is 
approximated by 



with sufficiently small numbers €1,62,63. The density of (61,62,63) with respect to the 
Lebesgue measure de\de2de 3 is l/Vol(S'0(3)) = l/(87r 2 ). Hence we obtain 



We have checked that the right-hand side gives a good approximation to the exact value 
of E(k, l,m) for k + l + m> 100. 

The same argument shows that for SO(p) 




(14) 





eiy/\l-4-4) m/2 ^2 d ^de 3 




as n —)• 00 when fcj = ncij, > 0, and fcj's are are all even or all odd. 
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